semantic lexicon
Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts
Leavy, Susan, Meaney, Gerardine, Wade, Karen, Greene, Derek
The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search, designed within the context of digital humanities scholarship. The platform provides a text mining workflow that combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons, allowing researchers to curate relevant sub-corpora from a large corpus of 18th- and 19th-century digitised texts.
- Asia > Middle East > Israel (0.05)
- Europe > France (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
- Health & Medicine > Epidemiology (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Communications > Social Media (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
Semantic Network Model for Sign Language Comprehension
Kang, Xinchen, Yao, Dengfeng, Jiang, Minghu, Huang, Yunlong, Li, Fanshu
In this study, the authors propose a computational cognitive model for sign language (SL) perception and comprehension with detailed algorithmic descriptions based on cognitive functionalities in human language processing. The semantic network model (SNM), which represents semantic relations between concepts, is used as a form of knowledge representation. The proposed model is applied to the comprehension of sign language for classifier predicates. The spreading activation search method is initiated by labeling a set of source nodes (e.g., concepts in the semantic network) with weights or "activation" and then iteratively propagating or "spreading" that activation out to other nodes linked to the source nodes. The results demonstrate that the proposed search method improves the performance of sign language comprehension in the SNM.
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania (0.04)
- (6 more...)
- Health & Medicine (1.00)
- Education > Curriculum > Subject-Specific Education (1.00)
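The spreading-activation search described in the abstract above can be sketched as propagation over a weighted graph. This is a minimal illustration, assuming a simple adjacency-list network; the nodes, edge weights, and decay factor are all hypothetical, not taken from the paper's model.

```python
# Minimal sketch of spreading activation over a semantic network.
# graph: {node: [(neighbor, weight), ...]}; sources: {node: initial activation}.

def spread_activation(graph, sources, decay=0.5, iterations=3):
    activation = dict(sources)
    for _ in range(iterations):
        updates = {}
        for node, act in activation.items():
            for neighbor, weight in graph.get(node, []):
                # Propagate a decayed share of activation along each edge.
                updates[neighbor] = updates.get(neighbor, 0.0) + act * weight * decay
        for node, inc in updates.items():
            activation[node] = activation.get(node, 0.0) + inc
    return activation

# Hypothetical fragment of a semantic network for a classifier predicate.
net = {
    "vehicle": [("car", 0.9), ("move", 0.6)],
    "car": [("drive", 0.8)],
}
result = spread_activation(net, {"vehicle": 1.0})
```

After a few iterations, activation has flowed from the labeled source node "vehicle" to its neighbors and their neighbors, so "car", "move", and "drive" all carry nonzero activation.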
ImmunoLingo: Linguistics-based formalization of the antibody language
Vu, Mai Ha, Robert, Philippe A., Akbar, Rahmad, Swiatczak, Bartlomiej, Sandve, Geir Kjetil, Haug, Dag Trygve Truslew, Greiff, Victor
Apparent parallels between natural language and biological sequences have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components such as lexicon (i.e., the discrete units of the language) and grammar (i.e., the rules that link sequence well-formedness, structure, and meaning), has led to largely domain-unspecific applications of LMs, which do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-function relationship rules, such as the ones underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we propose ImmunoLingo, a formalization of antibody language properties, and thereby establish not only a foundation for the application of linguistic tools in adaptive immune receptor analysis but also for systematic immunolinguistic studies of immune receptor specificity in general.
- Europe > Norway > Eastern Norway > Oslo (0.05)
- Asia > China (0.04)
- North America > United States > New York (0.04)
- (4 more...)
Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
Jo, Hwiyeol, Choi, Stanley Jungkyu
We propose a post-processing method for enriching not only word representations but also their vector space using semantic lexicons, which we call extrofitting. The method consists of three steps: (i) expanding one or more dimensions on all the word vectors, filling them with their representative value; (ii) transferring semantic knowledge by averaging the representative values of synonyms and filling them in the expanded dimension(s), which together make the representations of synonyms close together; and (iii) projecting the vector space using Linear Discriminant Analysis, which eliminates the expanded dimension(s) along with the semantic knowledge. When experimenting with GloVe, we find that our method outperforms Faruqui's retrofitting on some word similarity tasks. We also report further analysis of our method with respect to word vector dimensions and vocabulary size, as well as on other well-known pretrained word vectors (e.g., Word2Vec, Fasttext).
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.85)
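Steps (i) and (ii) of extrofitting can be illustrated in a few lines. The word vectors and synonym pair below are made up, and step (iii), the Linear Discriminant Analysis projection, is omitted for brevity.

```python
import numpy as np

# Hypothetical 2-dimensional word vectors and one synonym pair.
vecs = {
    "happy": np.array([0.2, 0.7]),
    "glad":  np.array([0.3, 0.5]),
    "sad":   np.array([-0.4, 0.1]),
}
synonyms = [("happy", "glad")]

# (i) Expand each vector by one dimension, filled with its representative
# value (here, the mean of its components).
expanded = {w: np.append(v, v.mean()) for w, v in vecs.items()}

# (ii) For each synonym set, average the representative values and write the
# shared value back into the expanded dimension, pulling synonyms closer.
for group in synonyms:
    shared = np.mean([expanded[w][-1] for w in group])
    for w in group:
        expanded[w][-1] = shared
```

After step (ii), "happy" and "glad" agree exactly in the expanded dimension, so any distance measure over the enriched space sees them as more similar than before.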
What Happened? Leveraging VerbNet to Predict the Effects of Actions in Procedural Text
Clark, Peter, Dalvi, Bhavana, Tandon, Niket
Our goal is to answer questions about paragraphs describing processes (e.g., photosynthesis). Texts of this genre are challenging because the effects of actions are often implicit (unstated), requiring background knowledge and inference to reason about the changing world states. To supply this knowledge, we leverage VerbNet to build a rulebase (called the Semantic Lexicon) of the preconditions and effects of actions, and use it along with commonsense knowledge of persistence to answer questions about change. Our evaluation shows that our system, ProComp, significantly outperforms two strong reading comprehension (RC) baselines. Our contributions are two-fold: the Semantic Lexicon rulebase itself, and a demonstration of how a simulation-based approach to machine reading can outperform RC methods that rely on surface cues alone. Since this work was performed, we have developed neural systems that outperform ProComp, described elsewhere (Dalvi et al., NAACL'18). However, the Semantic Lexicon remains a novel and potentially useful resource, and its integration with neural systems remains a currently unexplored opportunity for further improvements in machine reading about processes.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Colorado (0.04)
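The idea of a rulebase of action preconditions and effects can be sketched as STRIPS-style state updates. The rule and world state below are hypothetical stand-ins for illustration only, not actual entries from VerbNet or the Semantic Lexicon.

```python
# Apply a precondition/effect rule to a world state represented as a set
# of facts. Rule format: {'preconditions': set, 'adds': set, 'deletes': set}.

def apply_rule(state, rule):
    if not rule["preconditions"] <= state:
        return state  # preconditions unmet: no change to the world
    return (state - rule["deletes"]) | rule["adds"]

# Hypothetical rule for an "absorb" step in a photosynthesis-like process.
absorb = {
    "preconditions": {"water_at_roots"},
    "adds": {"water_in_plant"},
    "deletes": {"water_at_roots"},
}
state = {"water_at_roots", "sunlight"}
new_state = apply_rule(state, absorb)
```

Chaining such rules over the sentences of a paragraph yields a simulated sequence of world states, which is the kind of simulation-based reading the abstract contrasts with surface-cue RC methods.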
A Retrospective on Mutual Bootstrapping
Riloff, Ellen (University of Utah) | Jones, Rosie (Microsoft)
When we were invited to write a retrospective article about our AAAI-99 paper on mutual bootstrapping (Riloff and Jones 1999), our first reaction was hesitation because, well, that algorithm seems old and clunky now. But upon reflection, it shaped a great deal of subsequent work on bootstrapped learning for natural language processing, both by ourselves and others. So our second reaction was enthusiasm, for the opportunity to think about the path from 1999 to 2017 and to share the lessons that we learned about bootstrapped learning along the way. This article begins with a brief history of related research that preceded and inspired the mutual bootstrapping work, to position it with respect to that period of time. We then describe the general ideas and approach behind the mutual bootstrapping algorithm. Next, we overview several types of research that have followed and shared similar themes: multi-view learning, bootstrapped lexicon induction, and bootstrapped pattern learning. Finally, we discuss some of the general lessons that we have learned about bootstrapping techniques for NLP to offer guidance to researchers and practitioners who may be interested in exploring these types of techniques in their own work.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- (8 more...)
Joint Word Representation Learning Using a Corpus and a Semantic Lexicon
Bollegala, Danushka (The University of Liverpool) | Alsuhaibani, Mohammed (The University of Liverpool) | Maehara, Takanori (Shizuoka University) | Kawarabayashi, Ken-ichi (National Institute of Informatics)
Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as semantic similarity measurement and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure between words in a co-occurring context. On the other hand, much manual effort has already gone into the construction of semantic lexicons such as WordNet, which represent the meanings of words by defining the various relationships that exist among the words in a language. We consider the question: can we improve the word representations learnt using a corpus by integrating the knowledge from semantic lexicons? For this purpose, we propose a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence subject to the relational constraints given by the semantic lexicon. We use relations that exist between words in the lexicon to regularize the word representations learnt from the corpus. Our proposed method statistically significantly outperforms previously proposed methods for incorporating semantic lexicons into word representations on several benchmark datasets for semantic similarity and word analogy.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
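The shape of such a joint objective, a corpus-based co-occurrence term plus a lexicon-based regularizer, can be sketched as below. The toy vectors, word pairs, and weighting factor lam are all illustrative, not the paper's actual formulation.

```python
import numpy as np

# Joint loss: reward high dot products for co-occurring word pairs from the
# corpus, and penalize distance between pairs related in the semantic lexicon.
def joint_loss(W, cooc_pairs, lexicon_pairs, lam=0.1):
    corpus = -sum(np.dot(W[i], W[j]) for i, j in cooc_pairs)
    reg = sum(np.sum((W[i] - W[j]) ** 2) for i, j in lexicon_pairs)
    return corpus + lam * reg

# Hypothetical embeddings and relations.
W = {"king": np.array([1.0, 0.5]), "monarch": np.array([0.9, 0.6]),
     "crown": np.array([0.8, 0.2])}
loss = joint_loss(W, cooc_pairs=[("king", "crown")],
                  lexicon_pairs=[("king", "monarch")])
```

Minimizing this loss pulls "king" and "monarch" together via the lexicon term while keeping "king" predictive of its corpus neighbor "crown", which is the regularization idea the abstract describes.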
Symbiotic Cognitive Computing through Iteratively Supervised Lexicon Induction
Alba, Alfredo (IBM Research) | Drews, Clemens (IBM Research) | Gruhl, Daniel (IBM Research) | Lewis, Neal (IBM Research) | Mendes, Pablo N. (IBM Research) | Nagarajan, Meenakshi (IBM Research) | Welch, Steve (IBM Research) | Coden, Anni (IBM Research) | Qadir, Ashequl (University of Utah)
In this paper we approach a subset of semantic analysis tasks through a symbiotic cognitive computing approach -- the user and the system learn from each other and accomplish the tasks better than they would do on their own. Our approach starts with a domain expert building a simplified domain model (e.g. semantic lexicons) and annotating documents with that model. The system helps the user by allowing them to obtain quicker results, and by leading them to refine their understanding of the domain. Meanwhile, through the feedback from the user, the system adapts more quickly and produces more accurate results. We believe this virtuous cycle is key for building next generation high quality semantic analysis systems. We present some preliminary findings and discuss our results on four aspects of this virtuous cycle, namely: the intrinsic incompleteness of semantic models, the need for a human in the loop, the benefits of a computer in the loop and finally the overall improvements offered by the human-computer interaction in the process.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length
Qadir, Ashequl (University of Utah) | Mendes, Pablo N. (IBM Research) | Gruhl, Daniel (IBM Research) | Lewis, Neal (IBM Research)
With the rise of social media, learning from informal text has become increasingly important. We present a novel semantic lexicon induction approach that is able to learn new vocabulary from social media. Our method is robust to the idiosyncrasies of informal and open-domain text corpora. Unlike previous work, it does not impose restrictions on the lexical features of candidate terms (e.g., by restricting entries to nouns or noun phrases) while still being able to accurately learn multiword phrases of variable length. Starting with a few seed terms for a semantic category, our method first explores the context around seed terms in a corpus and identifies context patterns that are relevant to the category. These patterns are used to extract candidate terms, i.e., multiword segments that are further analyzed to ensure meaningful term boundary segmentation. We show that our approach is able to learn high-quality semantic lexicons from informally written social media text from Twitter, and can achieve accuracy as high as 92% for the top 100 learned category members.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- (3 more...)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games (1.00)
- Automobiles & Trucks (0.93)
- (2 more...)
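The seed-to-pattern-to-candidate pipeline described in the abstract can be sketched in miniature. The tweets, seed term, and single-token context patterns below are invented; the real method additionally scores pattern relatedness to the category and handles multiword terms of variable length.

```python
# Toy corpus and a single seed term for a hypothetical "food" category.
tweets = [
    "i love tacos so much",
    "i love burritos so much",
    "i hate mondays so much",
]
seeds = {"tacos"}

# Step 1: collect context patterns around seed terms (word before, word after).
patterns = set()
for t in tweets:
    words = t.split()
    for i, w in enumerate(words):
        if w in seeds and 0 < i < len(words) - 1:
            patterns.add((words[i - 1], words[i + 1]))

# Step 2: extract candidate terms that occur in the same contexts.
candidates = set()
for t in tweets:
    words = t.split()
    for i in range(1, len(words) - 1):
        if (words[i - 1], words[i + 1]) in patterns and words[i] not in seeds:
            candidates.add(words[i])
```

Here the pattern ("love", "so") learned from the seed "tacos" extracts "burritos" but not "mondays", since "mondays" appears only in the unrelated ("hate", "so") context.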
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE is compared to those acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > California > Alameda County > Berkeley (0.14)
- (20 more...)
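The active-learning selection step can be sketched as simple uncertainty sampling: annotate the examples the current model is least confident about. The example queries, confidence scores, and annotation budget below are hypothetical, and WOLFIE's actual selection criterion may differ.

```python
# Pick the unlabeled sentences with the lowest model confidence for annotation.
def select_for_annotation(unlabeled, confidence, budget=2):
    return sorted(unlabeled, key=lambda s: confidence[s])[:budget]

# Hypothetical parser confidence per unannotated sentence.
confidence = {
    "what rivers are in texas": 0.9,
    "show flights from austin": 0.4,
    "largest city in france": 0.2,
}
chosen = select_for_annotation(list(confidence), confidence)
```

Annotating only the low-confidence sentences, rather than a random sample, is what lets active learning reach a given accuracy with fewer labeled examples.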